Advanced Acoustic Modeling with the Hybrid HMM/BN Framework
Abstract
Most current state-of-the-art speech recognition systems are based on HMMs, which usually use a mixture of Gaussian functions as the state probability distribution model. It is common practice to use the EM algorithm to learn the Gaussian mixture parameters. In this case, the learning is done in a "blind", data-driven way, without taking into account how the speech signal was produced or which factors it depends on. In this paper, we describe the hybrid HMM/BN acoustic modeling framework, where, in contrast to the conventional mixture of Gaussians, the HMM state probability distribution is modeled by a Bayesian Network (BN), hence the name HMM/BN. Temporal speech characteristics are still governed by the HMM state transitions, but the state output likelihood is inferred from the BN. This allows for very flexible and consistent models of the state probability distributions, which can easily integrate different speech parameterizations. The BN can represent various speech features and environmental conditions, together with their underlying dependencies. We show that the conventional HMM is a special case of the HMM/BN model, which we regard as a generalization of the HMM. HMM/BN parameter learning is based on the Viterbi training paradigm and consists of two alternating steps: BN training and an update of the HMM transition probabilities. For recognition, in some cases, BN inference is computationally equivalent to evaluating a mixture of Gaussians, which allows the HMM/BN model to be used in existing HMM decoders. We present several examples of applying the HMM/BN model in speech recognition systems. Evaluations under various conditions and for different tasks showed that the HMM/BN model gives consistently better performance than the standard mixture-of-Gaussians HMM.
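The claim that BN inference can reduce to a mixture of Gaussians can be illustrated with a minimal sketch. The abstract does not specify the BN topology, so the one used here is an assumption: for each HMM state q, a hidden discrete variable C (for example, a speaker or noise condition) has a conditional probability table P(c | q), and the one-dimensional observation x is Gaussian given (q, c). All function names and numeric values are hypothetical. Marginalizing the hidden C gives p(x | q) = sum over c of P(c | q) N(x; mu, var), which has the same form as a Gaussian mixture whose weights are the table entries.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bn_state_likelihood(x, p_c_given_q, gaussians):
    """State output likelihood from the assumed BN, with C hidden.

    p_c_given_q: list of P(C=c | Q=q) for one state q (sums to 1).
    gaussians:   list of (mean, var) for p(x | C=c, Q=q), same order.
    Returns p(x | q) obtained by summing out the hidden variable C.
    """
    return sum(p * gaussian_pdf(x, m, v)
               for p, (m, v) in zip(p_c_given_q, gaussians))


def bn_to_gmm(p_c_given_q, gaussians):
    """Re-express the BN parameters as GMM weights, means, variances."""
    weights = list(p_c_given_q)
    means = [m for m, _ in gaussians]
    variances = [v for _, v in gaussians]
    return weights, means, variances


def gmm_likelihood(x, weights, means, variances):
    """Conventional mixture-of-Gaussians state likelihood."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical parameters for a single HMM state (illustration only).
P_C_GIVEN_Q = [0.6, 0.4]
GAUSSIANS = [(-1.0, 1.0), (2.0, 0.5)]

if __name__ == "__main__":
    x = 0.3
    bn_value = bn_state_likelihood(x, P_C_GIVEN_Q, GAUSSIANS)
    gmm_value = gmm_likelihood(x, *bn_to_gmm(P_C_GIVEN_Q, GAUSSIANS))
    print(bn_value, gmm_value)
```

Under this assumed topology, a decoder that already evaluates Gaussian mixtures per state would need no change: the BN parameters are simply read off as mixture weights and components. If C were observed at recognition time, the sum would collapse to the single Gaussian selected by the observed value.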
Similar resources
A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency
The most widely used acoustic unit in current automatic speech recognition systems is the triphone, which includes the immediate preceding and following phonetic contexts. Although triphones have proved to be an efficient choice, it is believed that they are insufficient in capturing all of the coarticulation effects. A wider phonetic context seems to be more appropriate, but often suffers from...
Using Hybrid HMM/BN Acoustic Models: Design and Implementation Issues
In recent years, the number of studies investigating new directions in speech modeling that go beyond the conventional HMM has increased considerably. One promising approach is to use Bayesian Networks (BN) as speech models. Full recognition systems based on Dynamic BN, as well as acoustic models using BN, have been proposed lately. Our group at ATR has been developing a hybrid HMM/BN model, wh...
A Hybrid HMM/BN Acoustic Model for Automatic Speech Recognition
In current HMM based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, articulator positions, etc. On the other hand, Bayesian Networks (BN) allow for easy combination of different continuous as well as discrete features by exploring conditional dependencies between them. However, the lack of efficient algorit...
Hybrid HMM/BN LVCSR system integrating multiple acoustic features
In current HMM based speech recognition systems, it is difficult to supplement acoustic spectrum features with additional information such as pitch, gender, articulator positions, etc. On the other hand, Dynamic Bayesian Networks (DBN) allow for easy combination of different features and make use of conditional dependencies between them. However, lack of efficient algorithms has prevented their...
Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework
Most of the current state-of-the-art speech recognition systems are based on speech signal parametrizations that crudely model the behavior of the human auditory system. However, little or no use is usually made of the knowledge on the human speech production system. A data-driven statistical approach to incorporate this knowledge into ASR would require a substantial amount of data, which are n...